On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

Author

Bruno Scherrer

Abstract

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, which compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an ε-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direct Policy Iteration (DPI) (Lagoudakis and Parr, 2003; Fern et al., 2006; Lazaric et al., 2010) and Conservative Policy Iteration (CPI) (Kakade and Langford, 2002). By paying particular attention to the concentrability constants involved in such guarantees, we argue that the guarantee of CPI is much better than that of DPI, but that this comes at the cost of a relative increase in time complexity that is exponential in 1/ε. We then describe an algorithm, Non-Stationary Direct Policy Iteration (NSDPI), that can be seen either as 1) a variation of Policy Search by Dynamic Programming by Bagnell et al. (2003) adapted to the infinite-horizon setting, or 2) a simplified version of the Non-Stationary PI with growing period of Scherrer and Lesner (2012). We provide an analysis of this algorithm showing, in particular, that it enjoys the best of both worlds: its performance guarantee is similar to that of CPI, while its time complexity is similar to that of DPI.
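
To make the difference between the DPI and CPI update rules concrete, here is a minimal Python sketch on a tiny tabular MDP. It assumes a known model (a transition array P and a reward array R) and uses an exact greedy operator (ε = 0) in place of the ε-approximate, sample-based greedy operator studied in the paper. CPI's step size α is a fixed constant here rather than being chosen as in Kakade and Langford (2002). The function names (evaluate, greedy, dpi, cpi) and the example MDP are hypothetical. The sketch illustrates only the two update rules, not the performance bounds, the concentrability constants, or NSDPI.

```python
import numpy as np

def evaluate(P, R, gamma, pi):
    """Exact policy evaluation for a stochastic policy.
    P: (A, S, S) transition probabilities, R: (S, A) rewards, pi: (S, A) action probabilities."""
    S = R.shape[0]
    P_pi = np.einsum("sa,ast->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)            # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    q = R + gamma * np.einsum("ast,t->sa", P, v)  # state-action values under pi
    return v, q

def greedy(q):
    """Exact greedy operator (epsilon = 0): a deterministic policy stored as one-hot rows."""
    pi = np.zeros_like(q)
    pi[np.arange(q.shape[0]), np.argmax(q, axis=1)] = 1.0
    return pi

def dpi(P, R, gamma, n_iter):
    """DPI-style iteration: replace the policy by the greedy one at every step."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    for _ in range(n_iter):
        _, q = evaluate(P, R, gamma, pi)
        pi = greedy(q)
    return pi

def cpi(P, R, gamma, n_iter, alpha=0.1):
    """CPI-style iteration with a fixed (assumed) step size alpha:
    move only a fraction alpha of the probability mass toward the greedy policy."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)
    for _ in range(n_iter):
        _, q = evaluate(P, R, gamma, pi)
        pi = (1.0 - alpha) * pi + alpha * greedy(q)  # conservative mixture update
    return pi

# Hypothetical 2-state, 2-action MDP: state 1 is absorbing and pays reward 1 for action 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: move to state 1
R = np.array([[0.0, 0.0],                 # state 0: no reward
              [1.0, 0.0]])                # state 1: reward 1 for action 0
gamma = 0.9

pi_dpi = dpi(P, R, gamma, n_iter=5)
pi_cpi = cpi(P, R, gamma, n_iter=50, alpha=0.1)
```

On this example, DPI reaches the optimal deterministic policy after a single iteration, whereas CPI only approaches it geometrically (the mass left on the initial policy is (1 − α)^k after k steps). This is one intuition for why conservative updates can require many more iterations.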

Similar resources

Efficient Algorithms for Just-In-Time Scheduling on a Batch Processing Machine

The just-in-time scheduling problem on a single batch processing machine is investigated in this research. Batch processing machines can process more than one job simultaneously and are widely used in the semiconductor industry. Owing to the requirements of the just-in-time strategy, the criterion considered is the minimization of total earliness and tardiness penalties. It is an acceptable criterion for b...

A Framework for Adapting Population-Based and Heuristic Algorithms for Dynamic Optimization Problems

In this paper, a general framework is presented for adapting heuristic optimization algorithms based on swarm intelligence from static to dynamic environments. In dynamic optimization problems, unlike static ones, the evaluation function or the constraints change over time, and hence so does the location of the optimum. The subject matter of the framework is based on the variability of the ...

Modeling and scheduling no-idle hybrid flow shop problems

Although several papers have studied no-idle scheduling problems, they all focus on flow shops, assuming one processor at each working stage. However, companies commonly extend flow shops into hybrid flow shops by duplicating machines in parallel at each stage. This paper considers the problem of scheduling no-idle hybrid flow shops. A mixed integer linear programming model is first developed to mathematically form...

A New Mathematical Model for a Multi-product Supply Chain Network with a Preventive Maintenance Policy

The supply chain network design (SCND) involves decision-making at a strategic level and makes it possible to create an effective and helpful context for management. The aim of the network is to minimize the total cost while meeting customer demand. Preventive maintenance is pre-determined work performed to a schedule with the aim of preventing the wear and tear or sudden failure of ...

Mathematical Programming Models for Solving Unequal-Sized Facilities Layout Problems - a Generic Search Method

This paper presents unequal-sized facilities layout solutions generated by a genetic search program named LADEGA (Layout Design using a Genetic Algorithm). The generalized quadratic assignment problem requiring pre-determined distance and material flow matrices as the input data and the continuous plane model employing a dynamic distance measure and a material flow matrix are discussed. Computa...

Journal: CoRR

Volume: abs/1306.0539

Publication date: 2013
